Smoothness-Adaptive Contextual Bandits

Authors

Abstract

We study a non-parametric multi-armed bandit problem with stochastic covariates, where a key complexity driver is the smoothness of payoff functions with respect to covariates. Previous studies have focused on deriving minimax-optimal algorithms in cases where it is a priori known how smooth the payoff functions are. In practice, however, this smoothness is typically not known in advance, and misspecification of smoothness may severely deteriorate the performance of existing methods. In this work, we consider a framework in which the smoothness of payoff functions is not known, and study when and how algorithms may adapt to unknown smoothness. First, we establish that designing such adaptive algorithms is, in general, impossible. However, under a self-similarity condition (which does not reduce the minimax complexity of the dynamic optimization problem at hand), adapting to unknown smoothness is possible, and we further devise a general policy for achieving smoothness-adaptive performance. Our policy infers the smoothness of payoffs throughout the decision-making process, while leveraging the structure of off-the-shelf non-adaptive policies. In settings with either differentiable or non-differentiable payoff functions, this policy matches (up to a logarithmic scale) the regret rate that is achievable when the smoothness is known a priori.
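To illustrate why the assumed smoothness matters, a typical off-the-shelf non-adaptive policy of the kind the abstract alludes to partitions the covariate space into bins and runs an independent UCB routine per bin, with the number of bins tuned to an assumed Hölder exponent. The sketch below is an illustrative toy version of this idea, not the authors' policy; the bin-count rule and all names are assumptions:

```python
import math

def binned_ucb(contexts, reward_fn, K=2, beta=1.0):
    """Toy per-bin UCB for a K-armed bandit with 1-d covariates in [0, 1].

    The bin count M ~ T^(1/(2*beta+1)) is the classic tuning for
    Hölder-beta payoffs; choosing beta wrong degrades regret, which is
    exactly what motivates smoothness-adaptive methods.
    Returns the total reward collected.
    """
    T = len(contexts)
    M = max(1, int(T ** (1.0 / (2 * beta + 1))))  # number of covariate bins
    counts = [[0] * K for _ in range(M)]
    sums = [[0.0] * K for _ in range(M)]
    total = 0.0
    for t, x in enumerate(contexts, start=1):
        b = min(M - 1, int(x * M))  # bin containing covariate x
        if 0 in counts[b]:
            a = counts[b].index(0)  # play each arm once per bin first
        else:
            a = max(range(K), key=lambda k: sums[b][k] / counts[b][k]
                    + math.sqrt(2 * math.log(t) / counts[b][k]))
        r = reward_fn(x, a)
        counts[b][a] += 1
        sums[b][a] += r
        total += r
    return total
```

Smaller `beta` (rougher payoffs) yields more, narrower bins; the adaptive question is how to pick this granularity without knowing `beta`.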


Related Articles

Unimodal Bandits without Smoothness

We consider stochastic bandit problems with a continuum set of arms, where the expected reward is a continuous and unimodal function of the arm. No further assumption is made regarding the smoothness or the structure of the expected reward function. We propose Stochastic Pentachotomy (SP), an algorithm for which we derive finite-time regret upper bounds. In particular, we sho...
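The core idea behind such algorithms is to repeatedly sample interior points of the arm interval and discard a region that unimodality guarantees cannot contain the peak. The sketch below is a simplified stochastic trisection illustrating that narrowing principle, not the Stochastic Pentachotomy routine itself; all names and parameters are assumptions:

```python
def stochastic_trisection(pull, lo=0.0, hi=1.0, rounds=20, samples=50):
    """Locate the peak of a unimodal expected-reward function on [lo, hi].

    Each round samples two interior points, averages noisy pulls, and
    discards the outer third of the interval on the side of the worse
    point; unimodality ensures the peak survives each elimination
    (with high probability under noise).
    """
    for _ in range(rounds):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        mean_a = sum(pull(a) for _ in range(samples)) / samples
        mean_b = sum(pull(b) for _ in range(samples)) / samples
        if mean_a < mean_b:
            lo = a  # peak cannot lie in [lo, a]
        else:
            hi = b  # peak cannot lie in [b, hi]
    return (lo + hi) / 2.0
```

With noiseless pulls this reduces to classical ternary search; with noisy pulls the per-round sample size controls the elimination error.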


Optimal and Adaptive Off-policy Evaluation in Contextual Bandits

We study the off-policy evaluation problem (estimating the value of a target policy using data collected by another policy) under the contextual bandit model. We consider the general (agnostic) setting without access to a consistent model of rewards and establish a minimax lower bound on the mean squared error (MSE). The bound is matched up to constants by the inverse propensity scoring (IPS) an...
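The IPS estimator mentioned in the snippet reweights each logged reward by the ratio of the target policy's action probability to the logging policy's. A minimal sketch (function and variable names are illustrative):

```python
def ips_estimate(logs, target_policy):
    """Inverse propensity scoring estimate of a target policy's value.

    `logs` holds (context, action, reward, logging_propensity) tuples;
    `target_policy(context, action)` returns the target policy's
    probability of taking `action` in `context`. The estimator is
    unbiased when the logging propensities are correct and bounded
    away from zero, but its variance grows as they approach zero.
    """
    total = 0.0
    for x, a, r, p_log in logs:
        total += target_policy(x, a) / p_log * r
    return total / len(logs)
```

For example, with a uniform logging policy over two actions and a deterministic target policy that always takes action 0, the estimate recovers the mean reward of action 0.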


Kernelized Collaborative Contextual Bandits

We tackle the problem of recommending products in the online recommendation scenario, which arises frequently in real applications; the most prominent and well-explored instances are news recommendation and advertising. In this work we propose an extension to state-of-the-art bandit models that not only accounts for different users' interactions, but also goes beyond the linearity assumption of ...


Contextual Dueling Bandits

We consider the problem of learning to choose actions using contextual information when provided with limited feedback in the form of relative pairwise comparisons. We study this problem in the dueling-bandits framework of Yue et al. (2009), which we extend to incorporate context. Roughly, the learner’s goal is to find the best policy, or way of behaving, in some space of policies, although “be...


Conservative Contextual Linear Bandits

Safety is a desirable property that can immensely increase the applicability of learning algorithms in real-world decision-making problems. It is much easier for a company to deploy an algorithm that is safe, i.e., guaranteed to perform at least as well as a baseline. In this paper, we study the issue of safety in contextual linear bandits that have application in many different fields includin...



Journal

Journal title: Social Science Research Network

Year: 2021

ISSN: 1556-5068

DOI: https://doi.org/10.2139/ssrn.3893198